Graphical EDA I: continuous data

Team 1

04 January 2021

Structure

1. Examining Continuous Variables

2. Looking for Structure: Dependency Relationships and Associations

3. Investigating Multivariate Continuous Data

1. Examining Continuous Variables

Exercise 1. Galaxies

Solution

a) Histogram

By default, qplot() (through stat_bin()) uses 30 bins, as the message below the plot notes. The histogram suggests a bimodal central mass of velocities, with small, separate groups of galaxies in both tails.

library(MASS)
library(ggplot2)
library(mclust)
## Package 'mclust' version 5.4.7
## Type 'citation("mclust")' for citing this R package in publications.
data(galaxies)
galaxies <- as.data.frame(galaxies)
names(galaxies) <- 'Velocity'
qplot(galaxies$Velocity) +
  labs(title='Histogram of Galaxy Velocity',
       x='Velocity of Galaxy',
       y='Frequency')
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

b) Density estimate

The density estimate from the fitted Gaussian mixture model (Mclust) shows three distinct superclusters; the group in the far right tail is less distinct.

library(mclust)
mod <- Mclust(galaxies$Velocity)   # Gaussian mixture, number of components chosen by BIC
plot(mod, what = "density")

c) Different plots:

To present all the information, I think we need at least five different plots, because each one shows different features of the data: a boxplot, a histogram, a rug plot, a dot plot and a density estimate.
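A minimal base-R sketch of these complementary views, drawn side by side. It uses base graphics rather than ggplot2 for brevity, and the density panel is a plain kernel estimate from density(), not the Mclust mixture fit shown in b); the choice of 20 suggested breaks is only an illustration.

```r
# Sketch: complementary base-graphics views of the galaxies velocities.
library(MASS)
data(galaxies)                 # numeric vector of galaxy velocities (km/s)
v <- as.numeric(galaxies)

op <- par(mfrow = c(2, 2))     # 2 x 2 grid of panels
hist(v, breaks = 20, main = "Histogram with rug", xlab = "Velocity")
rug(v)                         # one tick per galaxy below the histogram
boxplot(v, horizontal = TRUE, main = "Boxplot", xlab = "Velocity")
stripchart(v, method = "stack", pch = 16, main = "Dot plot", xlab = "Velocity")
plot(density(v), main = "Kernel density estimate", xlab = "Velocity")
par(op)                        # restore the previous layout
```

The boxplot summarises the centre and flags the extreme galaxies, while the rug, dot plot and density show the gaps between groups that the boxplot hides.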

Exercise 2. Boston housing

Solution

a) The histograms show different attributes:

There are several different histogram shapes, each telling a separate story. Default binwidths, dividing each variable's range by 30, have been used. Other scalings could reveal more information and would be more interpretable. It is interesting that the maxima of the vertical scales range from 40 to over 400. The main decisions are whether to plot the histograms individually and which binwidths and scale limits to choose.
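Plotting the histograms individually, each with its own scales, can be sketched in base R as follows. This assumes the MASS copy of the data, where variable names are lowercase (zn, black, ...); the 30 suggested breaks mirror the default mentioned above but are an illustrative choice.

```r
# Sketch: one histogram per Boston variable, each with its own x and y scales.
library(MASS)
data(Boston)

op <- par(mfrow = c(4, 4), mar = c(2, 2, 2, 1))
for (v in names(Boston)) {
  # 'breaks' is only a suggestion: hist() rounds it to "pretty" break points
  hist(Boston[[v]], breaks = 30, main = v, xlab = "", ylab = "")
}
par(op)
```

Because every panel gets its own vertical axis, the variables with tall single bars no longer flatten the others, which is the main advantage of individual plots.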

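b) For ZN and BLACK, boxplots might be better: most of their values are concentrated at one end of the range, so their histograms waste most of the space, whereas boxplots make efficient use of the space available.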
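A sketch comparing the two displays for these variables, again assuming the lowercase names of the MASS copy of the data. The share of zeros in zn is computed to show how concentrated that variable is.

```r
# Sketch: histogram versus boxplot for the two concentrated variables.
library(MASS)
data(Boston)

op <- par(mfrow = c(2, 2))
for (v in c("zn", "black")) {
  hist(Boston[[v]], main = paste("Histogram of", v), xlab = v)
  boxplot(Boston[[v]], horizontal = TRUE, main = paste("Boxplot of", v), xlab = v)
}
par(op)

# Share of suburbs with no residential land zoned for large lots (zn == 0)
share_zero_zn <- mean(Boston$zn == 0)
share_zero_zn
```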

Exercise 3. Student survey

Solution

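a) Histogram and density estimate

data(survey, package="MASS")
hist(survey$Height,
     xlab = 'Height (cm)',
     main = "Histogram of Students' Heights",
     ylab = 'Frequency')

# density() needs the missing heights removed first
plot(density(survey$Height, na.rm = TRUE),
     main = "Density Estimate of Students' Heights",
     xlab = 'Height (cm)')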

b) Examination of national survey data on young adults shows that the separation between the distributions of men’s and women’s heights is not wide enough to produce bimodality.
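One way to see this in the student data is to split the density by sex. In the sketch below, each group's density is weighted by the group's share so the curves sit on the same scale as the overall density; the weighting is an illustrative choice, and because the bandwidths differ the overall curve is only roughly the sum of the two.

```r
# Sketch: overall height density versus share-weighted densities by sex.
library(MASS)
data(survey)
h <- na.omit(survey[, c("Sex", "Height")])   # drop rows missing Sex or Height

p_f   <- mean(h$Sex == "Female")             # share of female students
d_all <- density(h$Height)
d_f   <- density(h$Height[h$Sex == "Female"])
d_m   <- density(h$Height[h$Sex == "Male"])

plot(d_all, main = "Student heights", xlab = "Height (cm)",
     ylim = c(0, max(d_all$y, p_f * d_f$y, (1 - p_f) * d_m$y)))
lines(d_f$x, p_f * d_f$y, lty = 2)
lines(d_m$x, (1 - p_f) * d_m$y, lty = 3)
legend("topright", c("All", "Female", "Male"), lty = 1:3)

group_means <- tapply(h$Height, h$Sex, mean)
group_means
```

If the two weighted curves overlap heavily, their sum has a single broad peak rather than two, which is the point made above.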

Exercise 4. Movie lengths

Solution

library(ggplot2)
data(movies, package="ggplot2movies")
hist(movies$year[movies$length == 90 | movies$length == 7],
     xlab = 'Year',
     main = 'Histogram of Years of Movies Lasting 7 or 90 Minutes',
     ylab = 'Number of movies')

a) The histogram shows that the peaks at lengths of 7 and 90 minutes are present in both periods, before and after 1980.

b) To classify a movie as short or long, I think density estimation is a good idea: the cutoff length for a short movie can be set at the lowest point of the estimated density between its two peaks.
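A minimal sketch of that idea: find the lowest point (antimode) of a kernel density estimate between two values. `density_cutoff()` is a hypothetical helper; working on the log scale and searching between 7 and 90 minutes are assumptions based on the peaks described above, and the movies part only runs if ggplot2movies is installed.

```r
# Hypothetical helper: location of the lowest point of a kernel density
# estimate between `lower` and `upper` (an "antimode"), usable as a cutoff.
density_cutoff <- function(x, lower, upper, ...) {
  d <- density(x, ...)                        # Gaussian kernel, default bandwidth
  inside <- d$x >= lower & d$x <= upper       # grid points between the two peaks
  d$x[inside][which.min(d$y[inside])]
}

if (requireNamespace("ggplot2movies", quietly = TRUE)) {
  data(movies, package = "ggplot2movies")
  len <- movies$length[!is.na(movies$length) & movies$length > 0]
  # Log scale, because the length distribution is very right-skewed
  cutoff <- exp(density_cutoff(log(len), log(7), log(90)))
  print(cutoff)                               # suggested cutoff in minutes
}
```

The cutoff depends on the bandwidth, so it is worth checking how much it moves when `adjust` is passed through `...` to density().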

Exercise 5. Zuni educational funding

Solution

a) Histogram or boxplot?


library(lawstat)
data(zuni, package="lawstat")
hist(zuni$Revenue,
     xlab = 'Revenue',
     main = 'Revenue Histogram',
     ylab = 'Frequency')

I prefer a histogram, because it shows the lowest 5% and the highest 5% of the revenues more clearly.
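The 5% limits can be marked on both displays to compare how well each shows them. `trim_limits()` is a hypothetical helper using R's default quantile definition (type 7); whether that matches the definition used in the Zuni funding test is an assumption this sketch does not check. The plotting part only runs if lawstat is installed.

```r
# Hypothetical helper: values below/above which the lowest and highest
# share p of the observations lie (R's default quantile type 7).
trim_limits <- function(x, p = 0.05) {
  quantile(x, probs = c(p, 1 - p), na.rm = TRUE)
}

if (requireNamespace("lawstat", quietly = TRUE)) {
  data(zuni, package = "lawstat")
  lims <- trim_limits(zuni$Revenue)
  op <- par(mfrow = c(2, 1))
  hist(zuni$Revenue, breaks = 20, xlab = "Revenue", main = "Revenue histogram")
  abline(v = lims, lty = 2)                  # dashed lines at the 5% and 95% points
  boxplot(zuni$Revenue, horizontal = TRUE, xlab = "Revenue", main = "Revenue boxplot")
  abline(v = lims, lty = 2)
  par(op)
}
```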

b) Density estimation:

library(mclust)
mod <- Mclust(zuni$Revenue)
plot(mod, what = "density")

Exercise 6. Non-detectable

Solution

There is no h39b.W1 variable in the CHAIN dataset, because it has been renamed log_virus. For both cases (with and without the zeros, which stand for non-detectable viral loads) I would use a histogram, because it makes the number of observations in each bin easy to see.

library(mi)
## Loading required package: Matrix
## Loading required package: stats4
## mi (Version 1.0, packaged: 2015-04-16 14:03:10 UTC; goodrich)
## mi  Copyright (C) 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015 Trustees of Columbia University
## This program comes with ABSOLUTELY NO WARRANTY.
## This is free software, and you are welcome to redistribute it
## under the General Public License version 2 or later.
## Execute RShowDoc('COPYING') for details.
data(CHAIN)
hist(CHAIN$log_virus,
     xlab = 'log(viral load)',
     main = 'Histogram of log viral load, including zeros',
     ylab = 'Frequency')

library(mi)
data(CHAIN)
hist(CHAIN$log_virus[CHAIN$log_virus != 0],
     xlab = 'log(viral load)',
     main = 'Histogram of log viral load, excluding zeros',
     ylab = 'Frequency')
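An alternative to two separate histograms is to count the zeros and report them alongside a histogram of the remaining values. `split_zeros()` is a hypothetical helper that drops missing values first; the plotting part only runs if the mi package is installed.

```r
# Hypothetical helper: separate exact zeros from the other observed values.
split_zeros <- function(x) {
  x <- x[!is.na(x)]                          # ignore missing values
  list(n_zero = sum(x == 0), nonzero = x[x != 0])
}

if (requireNamespace("mi", quietly = TRUE)) {
  data(CHAIN, package = "mi")
  parts <- split_zeros(CHAIN$log_virus)
  hist(parts$nonzero, xlab = "log(viral load)", ylab = "Frequency",
       main = paste0("Non-zero values (", parts$n_zero, " zeros not shown)"))
}
```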

Exercise 7. Diamonds

Solution

a) Diamond Weight

A diamond's weight is given by the carat variable. Let's see how weight relates to price:

library(ggplot2)
data(diamonds)
ggplot(diamonds, aes(x=carat, y=price)) + geom_point()

I wanted to compare the weight of a diamond with its price. Apparently the most expensive diamonds weigh between about 1.5 and 3 carats (1 carat = 0.2 g), and some of the cheapest diamonds are also the lightest.
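The distribution of weight on its own can be looked at with a histogram that uses a small binwidth. The 0.01-carat bins below are an illustrative choice; a fine binwidth shows detail that the default number of bins can hide.

```r
# Sketch: histogram of diamond weights with 0.01-carat bins (base graphics).
data(diamonds, package = "ggplot2")
carat <- diamonds$carat

h <- hist(carat, breaks = seq(0, ceiling(max(carat)), by = 0.01),
          main = "Diamond weights", xlab = "Weight (carat; 1 carat = 0.2 g)")
```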

b) Distribution of prices

data(diamonds, package="ggplot2")
hist(diamonds$price,
     xlab = 'Price (US dollars)',
     main = 'Histogram of Diamond Prices',
     ylab = 'Frequency')

For the distribution of diamond prices, I chose a histogram. It makes clear that the most expensive diamonds are the fewest: the distribution is strongly right-skewed. One likely reason is that such diamonds are very rare and hard to find; another is that they require more work than the others.
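A log scale is one common way to look inside a right-skewed distribution like this one. The sketch below shows the raw and log10 prices side by side and compares the mean with the median, which for a right-skewed distribution is typically lower than the mean; the 50 suggested breaks are an illustrative choice.

```r
# Sketch: raw versus log10 prices, plus a simple skewness check.
data(diamonds, package = "ggplot2")
price <- diamonds$price

op <- par(mfrow = c(1, 2))
hist(price, breaks = 50, main = "Price", xlab = "Price (US dollars)")
hist(log10(price), breaks = 50, main = "log10(price)",
     xlab = "log10(price in US dollars)")
par(op)

c(mean = mean(price), median = median(price))
```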

2. Looking for Structure: Dependency Relationships and Associations

Exercise 1. Movie ratings

Figure 5.7

library(ggplot2)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following object is masked from 'package:MASS':
## 
##     select
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
data(movies, package = "ggplot2movies")
ggplot(movies, aes(votes, rating)) + geom_point() + ylim(1, 10)

(a) Excluding all films with fewer than 100 votes:

library(ggplot2)
library(dplyr)

data(movies, package = "ggplot2movies")

filtered <- filter(movies, votes >= 100)   # keep films with at least 100 votes
ggplot(filtered, aes(votes, rating)) + geom_point() + ylim(1, 10) 

(b) Excluding films with average rating greater than 9 and also the ones that have more than 100000 votes:

library(ggplot2)
library(dplyr)

data(movies, package = "ggplot2movies")
# keep films rated at most 9 and with at most 100000 votes
filtered2 <- filter(movies, rating <= 9, votes <= 100000)


ggplot(filtered2, aes(votes, rating)) + geom_point() + ylim(1, 10) 

Exercise 2. Meta analysis (Olkin1995 dataset)

(a) Number of observations in each experimental group (n.exp) against the corresponding number of observations in each control group (n.cont):

library(meta)
## Loading 'meta' package (version 4.15-1).
## Type 'help(meta)' for a brief overview.
data(Olkin1995)

ggplot(Olkin1995, aes(n.exp, n.cont)) + geom_point()


Observations:
- there is a linear relationship between the two variables;
- there are 4 outliers with high values of both variables (around 2500, 6000 and 8500);
- there are several gaps in the data between these outliers;
- there seems to be some overplotting at the lower values.

(b) Restricting the scatterplot to only those with less than 100 patients in each group:

library(meta)
data(Olkin1995)
ggplot(Olkin1995, aes(n.exp, n.cont)) + geom_point() + ylim(1, 100) + xlim(1,100)

As there was some overplotting in this range, zooming in on it reveals 3-4 outliers that were not visible before.

Exercise 3. Zuni

(a) Scatterplot of average revenue per pupil (Revenue) against the corresponding number of pupils (Mem):

library(lawstat)
data(zuni)
# Revenue on the y axis, number of pupils on the x axis
ggplot(zuni, aes(Mem, Revenue)) + geom_point()

Observations:
- there are two outliers with high values of Mem;
- there are 4 outliers with Mem close to 0 and high values of Revenue;
- most values lie within a narrow interval, and there may be some overplotting, since few distinct points are visible compared with the number of rows in the dataset.

(b) Plotting against the log of the number of pupils preserves the order of the observations while making the outliers less extreme, so the log transform improves the plot.

library(lawstat)
data(zuni)
ggplot(zuni, aes(log(Mem), Revenue)) + geom_point()
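Both claims about the log transform can be checked directly. The helper names are hypothetical, and `spread_ratio()` (largest value divided by the median) is just one simple measure, chosen here, of how extreme the largest value is. The zuni part only runs if lawstat is installed.

```r
# Hypothetical helpers: does log() keep the ranking, and how extreme is the maximum?
same_ranking <- function(x) identical(rank(x), rank(log(x)))
spread_ratio <- function(x) max(x) / median(x)

if (requireNamespace("lawstat", quietly = TRUE)) {
  data(zuni, package = "lawstat")
  print(same_ranking(zuni$Mem))              # should be TRUE for positive values
  print(c(raw = spread_ratio(zuni$Mem), log = spread_ratio(log(zuni$Mem))))
}
```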

Taking the log of revenue per pupil as well adds no further insight to the scatterplot, as shown below:

library(lawstat)
data(zuni)
ggplot(zuni, aes(log(Mem), log(Revenue))) + geom_point()

Exercise 4. Pearson heights

(a) Scatterplot of the heights:

data(father.son, package="UsingR")
ggplot(father.son, aes(fheight, sheight)) + geom_point()

There are some outliers, for both higher and lower values of the variables, but it is hard to determine which ones.

(b) Including both points and highest density regions:

library(hdrcde)
## This is hdrcde 3.3
data(father.son, package="UsingR")

par(mar=c(3.1, 4.1, 1.1, 2.1))
with(father.son,hdr.boxplot.2d(fheight, sheight, show.points=TRUE, prob=c(0.01,0.05,0.5,0.75)))

Using a density estimate makes it easier to identify some outliers outside the contours: 3 at lower values and 2-3 at higher values, all of them bivariate.

Note: mar= A numeric vector of length 4, which sets the margin sizes in the following order: bottom, left, top, and right. The default is c(5.1, 4.1, 4.1, 2.1).

(c) Fitting a linear model and a smooth to the data. First the linear model (red), with the line y = x (black) for reference:

data(father.son, package="UsingR")

ggplot(father.son, aes(fheight, sheight)) + geom_point() + geom_smooth(method="lm", colour="red") + geom_abline(slope=1, intercept=0)
## `geom_smooth()` using formula 'y ~ x'

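Adding a smooth (ggplot2 fits a GAM here rather than loess, because the dataset has 1,000 or more rows, as the message below confirms) shows that a nonlinear model is not necessary, as the two curves are almost identical: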

data(father.son, package="UsingR")
ggplot(father.son, aes(fheight, sheight)) + geom_point() +
geom_smooth(method="lm", colour="red", se=FALSE) +
stat_smooth()
## `geom_smooth()` using formula 'y ~ x'
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
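The visual comparison can be backed up by an F test of a quadratic fit against a straight line. This swaps in a polynomial term rather than the GAM smooth used in the plot, and `linear_vs_quadratic()` is a hypothetical helper; the father.son part only runs if UsingR is installed.

```r
# Hypothetical helper: p-value of the F test comparing a straight-line fit
# with a quadratic fit (the linear model is nested in the quadratic one).
linear_vs_quadratic <- function(x, y) {
  m1 <- lm(y ~ x)
  m2 <- lm(y ~ poly(x, 2))
  anova(m1, m2)[2, "Pr(>F)"]
}

if (requireNamespace("UsingR", quietly = TRUE)) {
  data(father.son, package = "UsingR")
  print(with(father.son, linear_vs_quadratic(fheight, sheight)))
}
```

A large p-value would agree with the plot: the curvature term adds little beyond the straight line.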

Exercise 5. Bank discrimination

A subset of Roberts’ bank sex discrimination dataset from 1979 is available in the package Sleuth2 under the name case1202.

(a) Scatterplot matrix of the three variables: Senior, Age and Exper:

library(GGally)
## Registered S3 method overwritten by 'GGally':
##   method from   
##   +.gg   ggplot2
data(case1202, package="Sleuth2")
ggpairs(case1202[,c(4:5, 7)], title="Bank discrimination", diag=list(continuous='density'), axisLabels='none')

(b) Scatterplots involving seniority do not have the structure of the scatterplot of experience against age because:

Exercise 6. Cars

Figure 5.8.

data(Cars93, package="MASS")
ggplot(Cars93, aes(Weight, MPG.city)) + geom_point() +
geom_smooth(colour="green") + ylim(0,50)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

Note: fuel economy decreases with weight quite quickly initially and then more slowly.

Plotting 1/MPG.city (gallons per mile, a measure of fuel consumption like litres per 100 km, rather than fuel economy) against Horsepower:

data(Cars93, package="MASS")

ggplot(Cars93, aes(Horsepower, 1/MPG.city)) + geom_point() + geom_smooth(colour="green")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
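Gallons per mile can be converted to litres per 100 km using the definitions of the US gallon (3.785411784 L) and the mile (1.609344 km). The sketch below assumes MPG.city is in US miles per gallon; `mpg_to_l100km()` is a hypothetical helper, and lowess() stands in for the ggplot2 loess smooth.

```r
# Sketch: city fuel consumption in litres per 100 km against horsepower.
library(MASS)
data(Cars93)

# L/100 km = 100 * (litres per gallon) / (km per mile * mpg), about 235.21 / mpg
mpg_to_l100km <- function(mpg) 100 * 3.785411784 / (1.609344 * mpg)

Cars93$L100km.city <- mpg_to_l100km(Cars93$MPG.city)
plot(L100km.city ~ Horsepower, data = Cars93,
     ylab = "City fuel consumption (L/100 km)")
lines(lowess(Cars93$Horsepower, Cars93$L100km.city), col = "darkgreen")
```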

data(Cars93, package="MASS")
filtered <- filter(Cars93, Horsepower > 100)

Exercise 7. Leaves

The leafshape dataset in the DAAG package includes three measurements on each leaf (length, width, petiole) and the logarithms of the three measurements.

(a) Scatterplot matrices (sploms) for the two sets of three variables

library(GGally)
library(ggplot2)
data(leafshape, package="DAAG")
print(leafshape)
##      bladelen     petiole bladewid latitude     logwid        logpet     loglen
## 1   33.880000  1.40263200   13.650      5.0  2.6137395  0.3383504716  3.5228249
## 2   33.320000  1.01626000   10.260      5.0  2.3282528  0.0161292219  3.5061578
## 3   29.350000  2.39202500   12.210      5.0  2.5022553  0.8721402875  3.3792925
## 4   26.870000  0.80878700    8.700      5.0  2.1633230 -0.2122196846  3.2910104
## 5   26.670000  0.80276700    8.410      5.0  2.1294215 -0.2196907690  3.2835393
## 6   24.230000  1.49014500    7.700      5.0  2.0412203  0.3988734307  3.1875915
## 7   23.850000  1.09233000    5.690      5.0  1.7387102  0.0883130295  3.1717842
## 8   23.300000  1.91060000    8.410      5.0  2.1294215  0.6474173289  3.1484534
## 9   23.110000  1.88115400    7.950      5.0  2.0731719  0.6318854183  3.1402654
## 10  23.080000  0.97628400    6.980      5.0  1.9430489 -0.0240017513  3.1389664
## 11  22.140000  3.01546800    8.010      5.0  2.0806908  1.1037550420  3.0973859
## 12  21.570000  2.62291200    8.670      5.0  2.1598688  0.9642851510  3.0713035
## 13  21.300000  1.46118000    6.590      5.0  1.8855533  0.3792443285  3.0587071
## 14  20.970000  1.36095300    6.730      5.0  1.9065751  0.3081851896  3.0430928
## 15  20.820000  0.88485000    6.710      5.0  1.9035990 -0.1223371399  3.0359141
## 16  20.480000  0.97075200    6.060      5.0  1.8017098 -0.0296842501  3.0194488
## 17  20.420000  1.22315800    8.480      5.0  2.1377104  0.2014360389  3.0165148
## 18  20.410000  0.95722900    6.240      5.0  1.8309802 -0.0437126267  3.0160250
## 19  20.370000  0.99609300    5.070      5.0  1.6233408 -0.0039146523  3.0140632
## 20  20.130000  1.34468400    6.730      5.0  1.9065751  0.2961590412  3.0022112
## 21  20.040000  1.17835200    5.010      5.0  1.6114359  0.1641168521  2.9977303
## 22  19.370000  1.58640300    6.620      5.0  1.8900954  0.4614691893  2.9637255
## 23  19.340000  1.08884200    6.570      5.0  1.8825138  0.0851147462  2.9621755
## 24  19.300000  1.32784000    7.180      5.0  1.9712994  0.2835535619  2.9601051
## 25  19.210000  0.85100300    6.890      5.0  1.9300711 -0.1613396252  2.9554310
## 26  19.020000  0.88633200    7.440      5.0  2.0068708 -0.1206636807  2.9454911
## 27  18.560000  0.95027200    6.220      5.0  1.8277699 -0.0510070196  2.9210087
## 28  18.340000  0.67307800    5.600      5.0  1.7227666 -0.3958940571  2.9090845
## 29  18.190000  0.61664100    9.280      5.0  2.2278615 -0.4834682721  2.9008720
## 30  18.170000  0.36885100    8.320      5.0  2.1186623 -0.9973625105  2.8997719
## 31  18.070000  0.71918600    6.180      5.0  1.8213183 -0.3296352621  2.8942531
## 32  17.600000  0.62656000    5.430      5.0  1.6919391 -0.4675107391  2.8678989
## 33  17.510000  1.59165900    5.850      5.0  1.7664417  0.4647768685  2.8627721
## 34  17.260000  0.57303200    4.500      5.0  1.5040774 -0.5568137174  2.8483917
## 35  17.260000  0.91650600    6.300      5.0  1.8405496 -0.0871866651  2.8483917
## 36  17.220000  0.60958800    5.670      5.0  1.7351891 -0.4949719598  2.8460715
## 37  17.020000  0.74547600    5.600      5.0  1.7227666 -0.2937323385  2.8343891
## 38  16.800000  1.32384000    5.930      5.0  1.7800242  0.2805366043  2.8213789
## 39  16.060000  1.23180200    6.070      5.0  1.8033586  0.2084781379  2.7763317
## 40  15.940000  0.54355400    6.600      5.0  1.8870696 -0.6096262213  2.7688317
## 41  15.920000  0.60336800    6.210      5.0  1.8261609 -0.5052279865  2.7675762
## 42  15.580000  0.49856000    4.540      5.0  1.5129270 -0.6960313357  2.7459880
## 43  15.460000  0.75908600    5.510      5.0  1.7065646 -0.2756402010  2.7382560
## 44  15.020000  0.78704800    4.620      5.0  1.5303947 -0.2394660413  2.7093826
## 45  14.920000  0.97129200    4.430      5.0  1.4883996 -0.0291281350  2.7027026
## 46  14.540000  0.65720800    5.010      5.0  1.6114359 -0.4197547200  2.6769035
## 47  14.080000  0.18867200    5.370      5.0  1.6808279 -1.6677452213  2.6447554
## 48  13.750000  0.55275000    5.190      5.0  1.6467337 -0.5928494592  2.6210388
## 49  13.740000  0.98790600    5.940      5.0  1.7817091 -0.0121677275  2.6203113
## 50  13.590000  0.35469900    5.620      5.0  1.7263317 -1.0364857365  2.6093342
## 51  13.270000  0.75904400    5.130      5.0  1.6351057 -0.2756955323  2.5855058
## 52  13.070000  0.71362200    5.520      5.0  1.7083779 -0.3374018686  2.5703195
## 53  13.020000  0.57678600    4.850      5.0  1.5789787 -0.5502839652  2.5664866
## 54  13.020000  0.17186400    4.700      5.0  1.5475625 -1.7610518126  2.5664866
## 55  12.990000  1.98227400    5.260      5.0  1.6601310  0.6842446706  2.5641798
## 56  12.870000  0.71943300    4.970      5.0  1.6034198 -0.3292918772  2.5548990
## 57  12.650000  0.66286000    5.180      5.0  1.6448051 -0.4111914725  2.5376572
## 58  12.280000  0.38190800    5.510      5.0  1.7065646 -0.9625755371  2.5079719
## 59  12.030000  0.47999700    2.330      5.0  0.8458683 -0.7339754251  2.4874035
## 60  11.000000  0.42350000    3.790      5.0  1.3323660 -0.8592017649  2.3978953
## 61  10.400000  0.85904000    3.520      5.0  1.2584610 -0.1519397923  2.3418058
## 62  10.020000  0.35070000    3.230      5.0  1.1724821 -1.0478241218  2.3045831
## 63   9.810000  0.36000000    2.820      5.0  1.0367369 -1.0216512475  2.2834023
## 64   8.420000  0.22734000    2.530      5.0  0.9282193 -1.4813085847  2.1306098
## 65   2.280000  0.11263200    1.250      5.0  0.2231436 -2.1836294118  0.8241754
## 66  43.400000  3.37652000   10.940      5.0  2.3924258  1.2168455933  3.7704594
## 67  41.390000  5.54212100   17.750      5.0  2.8763855  1.7123772795  3.7230393
## 68  33.350000  3.23161500   12.270      5.0  2.5071573  1.1729820123  3.5070578
## 69  29.340000  3.22740000   10.050      5.0  2.3075726  1.1716768595  3.3789518
## 70  29.250000  4.43137500   10.420      5.0  2.3437270  1.4887099196  3.3758796
## 71  28.480000  4.93000000    7.420      5.0  2.0041791  1.5953389881  3.3492021
## 72  27.380000 22.16000000   26.410      5.0  3.2737427  3.0982888619  3.3098128
## 73  26.600000  1.25552000    7.250      5.0  1.9810015  0.2275498294  3.2809112
## 74  24.880000  3.30157600    7.750      5.0  2.0476928  1.1943999302  3.2140643
## 75  24.800000  3.86700000    9.150      5.0  2.2137539  1.3524790126  3.2108437
## 76  23.750000  2.50325000    7.140      5.0  1.9657128  0.9175898876  3.1675825
## 77  22.050000  4.58860500    8.610      5.0  2.1529243  1.5235760563  3.0933126
## 78  20.360000  2.69159200    6.080      5.0  1.8050047  0.9901328401  3.0135722
## 79  19.880000  3.21062000    6.080      5.0  1.8050047  1.1664640649  2.9897142
## 80  19.350000  3.04375500    6.960      5.0  1.9401795  1.1130919506  2.9626924
## 81   6.740000  0.21972400    2.480      9.1  0.9082586 -1.5153830657  1.9080599
## 82   7.700000  0.38962000    3.140      9.1  1.1442228 -0.9425833738  2.0412203
## 83   7.970000  0.28692000    2.630      9.1  0.9669838 -1.2485518477  2.0756845
## 84   8.820000  0.38014200    3.780      9.1  1.3297240 -0.9672104119  2.1770219
## 85   9.440000  0.43518400    3.950      9.1  1.3737156 -0.8319863488  2.2449560
## 86   9.920000  0.29264000    4.110      9.1  1.4134230 -1.2288120943  2.2945529
## 87  11.230000  0.52219500    4.110      9.1  1.4134230 -0.6497141976  2.4185888
## 88  11.260000  0.50670000    5.220      9.1  1.6524974 -0.6798361665  2.4212566
## 89  11.290000  0.29128200    4.170      9.1  1.4279160 -1.2334634089  2.4239174
## 90  12.120000  0.56600400    4.030      9.1  1.3937664 -0.5691541337  2.4948570
## 91  12.260000  0.60564400    5.450      9.1  1.6956156 -0.5014629243  2.5063419
## 92  12.330000  0.49073400    3.720      9.1  1.3137237 -0.7118530495  2.5120353
## 93  12.920000  0.57623200    3.810      9.1  1.3376292 -0.5512449216  2.5587765
## 94  13.000000  0.31070000    3.650      9.1  1.2947272 -1.1689274626  2.5649494
## 95  13.560000  0.97360800    4.150      9.1  1.4231083 -0.0267465204  2.6071243
## 96  14.570000  0.88439900    4.880      9.1  1.5851452 -0.1228469607  2.6789646
## 97  14.580000  1.25679600    5.850      9.1  1.7664417  0.2285656253  2.6796507
## 98  14.600000  0.87016000    6.180      9.1  1.8213183 -0.1390781762  2.6810215
## 99  15.120000  0.51105600    6.420      9.1  1.8594181 -0.6712761057  2.7160184
## 100 15.200000  0.42712000    6.130      9.1  1.8131947 -0.8506902748  2.7212954
## 101 15.670000  1.09063200    6.250      9.1  1.8325815  0.0867573447  2.7517481
## 102 16.180000  0.60836800    4.050      9.1  1.3987169 -0.4969753170  2.7837759
## 103 16.620000  0.30082200    5.800      9.1  1.7578579 -1.2012365513  2.8106068
## 104 16.710000  1.56739800    4.370      9.1  1.4747630  0.4494169196  2.8160073
## 105 17.100000  0.53865000    4.850      9.1  1.5789787 -0.6186892696  2.8390785
## 106 17.630000  1.65016800    4.800      9.1  1.5686159  0.5008771009  2.8696020
## 107 17.690000  1.07555200    6.280      9.1  1.8373700  0.0728340182  2.8729995
## 108 18.700000  1.04159000    7.130      9.1  1.9643112  0.0407483918  2.9285235
## 109 19.100000  2.27672000    8.320      9.1  2.1186623  0.8227358107  2.9496883
## 110 19.290000  1.32329400    7.080      9.1  1.9572739  0.2801240827  2.9595868
## 111 19.990000  1.74712600    6.650      9.1  1.8946169  0.5579721522  2.9952321
## 112 20.250000  1.22107500    7.270      9.1  1.9837563  0.1997316183  3.0081548
## 113 20.870000  0.88488800    8.210      9.1  2.1053529 -0.1222941957  3.0383127
## 114 21.160000  0.50995600    7.130      9.1  1.9643112 -0.6734308315  3.0521126
## 115 26.990000  1.34950000   10.340      9.1  2.3360199  0.2997341535  3.2954664
## 116 28.940000  1.27625400    8.430      9.1  2.1317968  0.2439292247  3.3652247
## 117 45.230000  3.52341700   16.670      9.1  2.8136107  1.2594312574  3.8117606
## 118 45.890000 51.08000000   43.580      9.1  3.7745983  3.9333930312  3.8262472
## 119 20.920000 12.92437600   15.090      9.1  2.7140323  2.5591151407  3.0407056
## 120 22.870000 11.14683800   14.810      9.1  2.6953026  2.4111558702  3.1298260
## 121 25.940000 11.40581800   16.560      9.1  2.8069901  2.4341235761  3.2557862
## 122 14.240000  5.70739200    7.220      9.1  1.9768550  1.7417621768  2.6560549
## 123 16.250000  4.25425000    5.640      9.1  1.7298841  1.4479184833  2.7880929
## 124 49.300000  9.71703000   26.250      9.1  3.2676660  2.2738800162  3.8979241
## 125 24.180000  2.46394200   10.800      9.1  2.3795461  0.9017625064  3.1855258
## 126 14.580000  1.43321400    4.500      9.1  1.5040774  0.3599194748  2.6796507
## 127 49.450000  4.83126500   11.540      9.1  2.4458193  1.5751083381  3.9009621
## 128 13.800000  1.31652000    6.580      9.1  1.8840347  0.2749918916  2.6246686
## 129 14.030000  1.22481900    5.400      9.1  1.6863990  0.2027930780  2.6411979
## 130 25.800000  1.91952000    7.120      9.1  1.9629077  0.6520751548  3.2503745
## 131 14.790000  0.93177000    3.370      9.1  1.2149127 -0.0706692759  2.6939513
## 132 28.940000  1.72193000    8.840      9.1  2.1792869  0.5434457548  3.3652247
## 133 18.350000  0.78905000    6.990      9.1  1.9444806 -0.2369255888  2.9096296
## 134 11.670000  0.39444600    4.670      9.1  1.5411591 -0.9302730302  2.4570214
## 135 13.110000  0.29104200    4.680      9.1  1.5432981 -1.2342876923  2.5733753
## 136  7.770000  0.47785500    3.310     10.4  1.1969482 -0.7384479398  2.0502702
## 137 10.310000  0.28249400    2.620     10.4  0.9631743 -1.2640979676  2.3331143
## 138 11.280000  1.04678400    4.700     10.4  1.5475625  0.0457226069  2.4230312
## 139 11.670000  0.53215200    2.820     10.4  1.0367369 -0.6308261162  2.4570214
## 140 13.380000  0.58738200    3.750     10.4  1.3217558 -0.5320799042  2.5937611
## 141 14.100000  0.37647000    4.920     10.4  1.5933085 -0.9769169162  2.6461748
## 142 14.100000  0.56118000    5.590     10.4  1.7209793 -0.5777135693  2.6461748
## 143 14.180000  0.72743400    5.300     10.4  1.6677068 -0.3182320057  2.6518325
## 144 15.010000  1.34189400    5.810     10.4  1.7595806  0.2940820488  2.7087166
## 145 16.100000  0.85652000    5.770     10.4  1.7526721 -0.1548776106  2.7788193
## 146 16.560000  0.62265600    4.780     10.4  1.5644405 -0.4737610796  2.8069901
## 147 16.590000  0.64203300    6.290     10.4  1.8389611 -0.4431155747  2.8088001
## 148 16.860000  1.71297600    4.620     10.4  1.5303947  0.5382322087  2.8249440
## 149 17.010000  1.55811600    7.970     10.4  2.0756845  0.4434773991  2.8338014
## 150 17.860000  0.51079600    6.180     10.4  1.8213183 -0.6717849857  2.8825636
## 151 18.010000  1.62090000    8.050     10.4  2.0856721  0.4829815505  2.8909272
## 152 18.100000  1.08962000    6.570     10.4  1.8825138  0.0858290116  2.8959119
## 153 18.710000  0.97479100    5.580     10.4  1.7191888 -0.0255321899  2.9290581
## 154 18.750000  0.75562500    5.960     10.4  1.7850705 -0.2802100576  2.9311938
## 155 19.740000  2.49711000    7.330     10.4  1.9919755  0.9151340632  2.9826470
## 156 20.100000  0.74772000    6.310     10.4  1.8421357 -0.2907267026  3.0007198
## 157 20.160000  1.06848000    8.140     10.4  2.0967902  0.0662370778  3.0037004
## 158 20.320000  2.28396800   10.010     10.4  2.3035846  0.8259142812  3.0116056
## 159 20.350000  2.79202000   10.290     10.4  2.3311725  1.0267653482  3.0130809
## 160 20.420000  1.06184000    4.790     10.4  1.5665304  0.0600032523  3.0165148
## 161 22.670000  1.27405400    7.490     10.4  2.0135688  0.2422039424  3.1210425
## 162 22.900000  1.33049000    9.000     10.4  2.1972246  0.2855472954  3.1311369
## 163 23.800000  0.70210000    8.370     10.4  2.1246539 -0.3536794350  3.1696856
## 164 23.870000  1.28181900    9.340     10.4  2.2343063  0.2482801629  3.1726224
## 165 24.700000  2.01058000   10.590     10.4  2.3599102  0.6984232377  3.2068032
## 166 26.400000  0.90816000    8.560     10.4  2.1471002 -0.0963347045  3.2733640
## 167 28.120000  1.16698000    7.310     10.4  1.9892433  0.1544192152  3.3364811
## 168 30.060000  1.56913200    9.020     10.4  2.1994443  0.4505226002  3.4031954
## 169 28.370000  6.76908200   15.840     10.4  2.7625384  1.9123654795  3.3453322
## 170 17.330000  3.39841300    7.200     10.4  1.9740810  1.2233085579  2.8524391
## 171 17.680000  1.46213600    5.420     10.4  1.6900958  0.3798983803  2.8724341
## 172 18.000000  1.00800000    4.120     10.4  1.4158532  0.0079681696  2.8903718
## 173 19.210000 12.23484900    9.340     10.4  2.2343063  2.5042883552  2.9554310
## 174 20.180000  1.95140600   10.130     10.4  2.3155013  0.6685501384  3.0046920
## 175 21.020000  2.82508800    7.240     10.4  1.9796212  1.0385395146  3.0454744
## 176 22.440000  8.59003200   13.090     10.4  2.5718486  2.1506024613  3.1108451
## 177 26.870000 12.65577000   13.590     10.4  2.6093342  2.5381132377  3.2910104
## 178 30.800000  2.40240000   10.140     10.4  2.3164880  0.8764682377  3.4275147
## 179 31.950000 24.93000000   36.620     10.4  3.6005945  3.2160718975  3.4641722
## 180 32.840000  5.45144000   11.000     10.4  2.3978953  1.6958797940  3.4916473
## 181 33.290000  2.86294000   10.540     10.4  2.3551775  1.0518490689  3.5052571
## 182 38.220000  2.63335800   15.720     10.4  2.7549338  0.9682598378  3.6433589
## 183 46.600000 48.88000000   46.170     10.4  3.8323302  3.8893683149  3.8416005
## 184 46.940000  3.37498600   15.800     10.4  2.7600099  1.2163911762  3.8488702
## 185 72.300000  2.39313000   20.600     10.4  3.0252911  0.8726021326  4.2808241
## 186  6.930000  0.44906400    3.100     17.1  1.1314021 -0.8005898624  1.9358598
## 187  7.270000  0.26172000    2.500     17.1  0.9162907 -1.3404800490  1.9837563
## 188  7.430000  0.76603300    2.980     17.1  1.0919233 -0.2665300292  2.0055259
## 189  8.000000  0.59280000    3.060     17.1  1.1184149 -0.5228982050  2.0794415
## 190  8.080000  0.49290000    3.030     17.1  1.1085626 -0.7074489653  2.0893919
## 191  8.480000  0.81916800    4.330     17.1  1.4655675 -0.1994660880  2.1377104
## 192  8.560000  0.59834400    4.820     17.1  1.5727739 -0.5135894396  2.1471002
## 193  8.720000  0.59010000    4.880     17.1  1.5851452 -0.5274632649  2.1656192
## 194  9.153333  0.52082465    3.010     17.1  1.1019401 -0.6523418620  2.2141181
## 195  9.353333  0.89324330    4.040     17.1  1.3962447 -0.1128962806  2.2357328
## 196  9.480000  0.23984400    3.240     17.1  1.1755733 -1.4277665670  2.2491843
## 197  9.560000  0.65964000    2.840     17.1  1.0438041 -0.4160610473  2.2575877
## 198 10.060000  1.25951200    3.370     17.1  1.2149127  0.2307243444  2.3085672
## 199 10.060000  0.60259400    3.790     17.1  1.3323660 -0.5065116092  2.3085672
## 200 10.110000  0.74460150    3.720     17.1  1.3137237 -0.2949061030  2.3135250
## 201 10.370000  0.28100000    3.480     17.1  1.2470323 -1.2694006096  2.3389170
## 202 10.440000  0.48963600    3.990     17.1  1.3837912 -0.7140930211  2.3456446
## 203 10.500000  0.09975000    3.330     17.1  1.2029723 -2.3050882232  2.3513753
## 204 10.550000  1.00014000    5.160     17.1  1.6409366  0.0001399902  2.3561259
## 205 10.870000  0.84786000    4.010     17.1  1.3887912 -0.1650397512  2.3860067
## 206 11.220000  1.10404800    4.650     17.1  1.5368672  0.0989834252  2.4176979
## 207 11.660000  0.87100000    3.620     17.1  1.2864740 -0.1381133021  2.4561642
## 208 11.860000  0.85629200    4.330     17.1  1.4655675 -0.1551438395  2.4731714
## 209 12.230000  0.38646800    4.110     17.1  1.4134230 -0.9507062087  2.5038919
## 210 12.840000  0.88981200    4.360     17.1  1.4724721 -0.1167450745  2.5525653
## 211 12.900000  0.27090000    4.180     17.1  1.4303112 -1.3060055299  2.5572273
## 212 12.960000  0.41536800    4.030     17.1  1.3937664 -0.8785904047  2.5618677
## 213 13.030000  0.28763725    3.290     17.1  1.1908876 -1.2460551414  2.5672544
## 214 13.400000  1.47802000    5.590     17.1  1.7209793  0.3907033542  2.5952547
## 215 13.413333  0.85487195    4.340     17.1  1.4678743 -0.1568035850  2.5962492
## 216 13.590000  0.96964650    5.640     17.1  1.7298841 -0.0308237069  2.6093342
## 217 14.360000  0.54855200    4.260     17.1  1.4492692 -0.6004731997  2.6644466
## 218 14.400000  0.45072000    3.975     17.1  1.3800247 -0.7969089749  2.6672282
## 219 14.600000  1.04828000    3.050     17.1  1.1151416  0.0471507258  2.6810215
## 220 14.900000  0.93125000    4.570     17.1  1.5195132 -0.0712275093  2.7013612
## 221 16.490000  1.20541900    7.610     17.1  2.0294632  0.1868272243  2.8027541
## 222 16.670000  0.69847300    5.680     17.1  1.7369512 -0.3588587553  2.8136107
## 223 17.040000  1.15957200    6.860     17.1  1.9257074  0.1480509715  2.8355635
## 224 19.890000  1.17950000    7.810     17.1  2.0554050  0.1650906199  2.9902171
## 225 24.430000  1.20440000    6.960     17.1  1.9401795  0.1859815176  3.1958119
## 226 30.350000  1.65862750    6.780     17.1  1.9139771  0.5059904531  3.4127965
## 227 13.980000  1.50984000    3.900     17.1  1.3609766  0.4120036849  2.6376277
## 228 14.450000  2.88422000    5.720     17.1  1.7439688  1.0592544995  2.6706944
## 229 15.250000  0.84027500    3.000     17.1  1.0986123 -0.1740260598  2.7245795
## 230 15.300000  1.87425000    5.880     17.1  1.7715568  0.6282085794  2.7278528
## 231 15.348000  1.80492480    4.840     17.1  1.5769147  0.5905189289  2.7309852
## 232 15.460000  1.84283200    5.850     17.1  1.7664417  0.6113035188  2.7382560
## 233 15.560000  1.03940800    4.720     17.1  1.5518088  0.0386513203  2.7447035
## 234 17.000000  0.41480000    7.220     17.1  1.9768550 -0.8799588026  2.8332133
## 235 17.800000  1.55572000    5.280     17.1  1.6639261  0.4419384610  2.8791985
## 236 17.970000  3.52122150    6.150     17.1  1.8164521  1.2588079465  2.8887037
## 237 18.760000  1.87600000    7.190     17.1  1.9726912  0.6291418506  2.9317269
## 238 19.680000  6.59280000    8.160     17.1  2.0992442  1.8859781445  2.9796029
## 239 20.980000  3.26868400    6.580     17.1  1.8840347  1.1843874574  3.0435696
## 240 22.215000 10.47437250   11.460     17.1  2.4388627  2.3489315595  3.1007677
## 241 22.640000  0.87503600    3.800     17.1  1.3350011 -0.1334902506  3.1197183
## 242 25.275000  3.63454500    6.850     17.1  1.9242487  1.2904839312  3.2298158
## 243 26.550000  2.63508750    5.920     17.1  1.7783364  0.9689163883  3.2790297
## 244 26.992500  3.83563425    8.530     17.1  2.1435894  1.3443348058  3.2955590
## 245 35.843333  3.79939330    9.510     17.1  2.2523439  1.3348413956  3.5791576
## 246 42.275000  1.49442125   23.450     17.1  3.1548705  0.4017390081  3.7441959
## 247  0.855500  0.06916717    0.545     28.2 -0.6069695 -2.6712288786 -0.1560692
## 248  2.080000  0.12937600    0.793     28.2 -0.2319321 -2.0450323855  0.7323679
## 249  6.220000  0.43229000    3.200     28.2  1.1631508 -0.8386586197  1.8277699
## 250  6.900000  0.24978000    2.860     28.2  1.0508216 -1.3871747485  1.9315214
## 251  7.160000  0.34010000    3.260     28.2  1.1817272 -1.0785155870  1.9685100
## 252  8.030000  0.66247500    4.020     28.2  1.3912819 -0.4117724577  2.0831845
## 253  8.160000  0.53692800    3.330     28.2  1.2029723 -0.6218912717  2.0992442
## 254  8.410000  0.27837100    2.660     28.2  0.9783261 -1.2788005226  2.1294215
## 255  9.660000  0.68199600    3.440     28.2  1.2354715 -0.3827314863  2.2679936
## 256 11.075000  1.03900000    4.480     28.2  1.4996230  0.0382587121  2.4046903
## 257 11.390000  0.75060100    4.690     28.2  1.5454326 -0.2868810600  2.4327358
## 258 11.460000  0.59935800    3.420     28.2  1.2296406 -0.5118961966  2.4388627
## 259 11.920000  0.72354400    5.080     28.2  1.6253113 -0.3235939193  2.4782177
## 260  4.240000  0.31588000    1.540     28.2  0.4317824 -1.1523928844  1.4445633
## 261  7.220000  0.88950400    2.930     28.2  1.0750024 -0.1170912750  1.9768550
## 262  8.230000  0.28475800    3.050     28.2  1.1151416 -1.2561155822  2.1077860
## 263  9.540000  0.99693000    4.000     28.2  1.3862944 -0.0030747221  2.2554935
## 264  9.700000  0.77503000    2.930     28.2  1.0750024 -0.2548535407  2.2721259
## 265 10.010000  0.90690600    3.310     28.2  1.1969482 -0.0977164726  2.3035846
## 266 10.800000  0.95094000    3.570     28.2  1.2725656 -0.0503043099  2.3795461
## 267 11.460000  3.05065200    5.040     28.2  1.6174061  1.1153553383  2.4388627
## 268 11.870000  5.78899900    6.680     28.2  1.8991180  1.7559593924  2.4740142
## 269 11.940000  0.52774800    3.050     28.2  1.1151416 -0.6391363819  2.4798941
## 270 12.380000  0.66397654    3.360     28.2  1.2119410 -0.4095084615  2.5160823
## 271 12.600000  0.72702000    4.040     28.2  1.3962447 -0.3188012915  2.5336968
## 272 12.680000  0.90471800    3.280     28.2  1.1878434 -0.1001319861  2.5400259
## 273 13.840000  1.20131200    4.300     28.2  1.4586150  0.1834142929  2.6275630
## 274 16.920000  0.65818800    3.470     28.2  1.2441546 -0.4182646742  2.8284964
## 275 21.900000 21.35000000   23.650     28.2  3.1633631  3.0610517397  3.0864866
## 276 29.500000  0.85550000   20.000     28.2  2.9957323 -0.1560691856  3.3843903
## 277 36.520000 30.18000000   30.570     28.2  3.4200191  3.4071794533  3.5978601
## 278  1.149429  0.15858787    0.890     42.0 -0.1165338 -1.8414464607  0.1392653
## 279  6.033333  0.48165907    2.750     42.0  1.0116009 -0.7305187326  1.7972996
## 280  3.794286  0.48886719    1.510     42.0  0.4121097 -0.7156644194  1.3334963
## 281  4.950500  0.60445605    2.130     42.0  0.7561220 -0.5034263163  1.5994886
## 282 16.646667  1.40719270    3.730     42.0  1.3164082  0.3415967282  2.8122100
## 283  4.128000  0.39653568    1.160     42.0  0.1484200 -0.9249892546  1.4177930
## 284 12.900000  1.15197000    2.550     42.0  0.9360934  0.1414735203  2.5572273
## 285  1.121333  0.13426056    0.415     42.0 -0.8794768 -2.0079728597  0.1145182
## 286  1.475000  0.09292500    0.145     42.0 -1.9310215 -2.3759625628  0.3886580
##     arch     location
## 1      0        Sabah
## 2      0        Sabah
## 3      0        Sabah
## 4      0        Sabah
## 5      0        Sabah
## 6      0        Sabah
## 7      0        Sabah
## 8      0        Sabah
## 9      0        Sabah
## 10     0        Sabah
## 11     0        Sabah
## 12     0        Sabah
## 13     0        Sabah
## 14     0        Sabah
## 15     0        Sabah
## 16     0        Sabah
## 17     0        Sabah
## 18     0        Sabah
## 19     0        Sabah
## 20     0        Sabah
## 21     0        Sabah
## 22     0        Sabah
## 23     0        Sabah
## 24     0        Sabah
## 25     0        Sabah
## 26     0        Sabah
## 27     0        Sabah
## 28     0        Sabah
## 29     0        Sabah
## 30     0        Sabah
## 31     0        Sabah
## 32     0        Sabah
## 33     0        Sabah
## 34     0        Sabah
## 35     0        Sabah
## 36     0        Sabah
## 37     0        Sabah
## 38     0        Sabah
## 39     0        Sabah
## 40     0        Sabah
## 41     0        Sabah
## 42     0        Sabah
## 43     0        Sabah
## 44     0        Sabah
## 45     0        Sabah
## 46     0        Sabah
## 47     0        Sabah
## 48     0        Sabah
## 49     0        Sabah
## 50     0        Sabah
## 51     0        Sabah
## 52     0        Sabah
## 53     0        Sabah
## 54     0        Sabah
## 55     0        Sabah
## 56     0        Sabah
## 57     0        Sabah
## 58     0        Sabah
## 59     0        Sabah
## 60     0        Sabah
## 61     0        Sabah
## 62     0        Sabah
## 63     0        Sabah
## 64     0        Sabah
## 65     0        Sabah
## 66     1        Sabah
## 67     1        Sabah
## 68     1        Sabah
## 69     1        Sabah
## 70     1        Sabah
## 71     1        Sabah
## 72     1        Sabah
## 73     1        Sabah
## 74     1        Sabah
## 75     1        Sabah
## 76     1        Sabah
## 77     1        Sabah
## 78     1        Sabah
## 79     1        Sabah
## 80     1        Sabah
## 81     0       Panama
## 82     0       Panama
## 83     0       Panama
## 84     0       Panama
## 85     0       Panama
## 86     0       Panama
## 87     0       Panama
## 88     0       Panama
## 89     0       Panama
## 90     0       Panama
## 91     0       Panama
## 92     0       Panama
## 93     0       Panama
## 94     0       Panama
## 95     0       Panama
## 96     0       Panama
## 97     0       Panama
## 98     0       Panama
## 99     0       Panama
## 100    0       Panama
## 101    0       Panama
## 102    0       Panama
## 103    0       Panama
## 104    0       Panama
## 105    0       Panama
## 106    0       Panama
## 107    0       Panama
## 108    0       Panama
## 109    0       Panama
## 110    0       Panama
## 111    0       Panama
## 112    0       Panama
## 113    0       Panama
## 114    0       Panama
## 115    0       Panama
## 116    0       Panama
## 117    0       Panama
## 118    1       Panama
## 119    1       Panama
## 120    1       Panama
## 121    1       Panama
## 122    1       Panama
## 123    1       Panama
## 124    1       Panama
## 125    1       Panama
## 126    1       Panama
## 127    1       Panama
## 128    1       Panama
## 129    1       Panama
## 130    1       Panama
## 131    1       Panama
## 132    1       Panama
## 133    1       Panama
## 134    1       Panama
## 135    1       Panama
## 136    0   Costa Rica
## 137    0   Costa Rica
## 138    0   Costa Rica
## 139    0   Costa Rica
## 140    0   Costa Rica
## 141    0   Costa Rica
## 142    0   Costa Rica
## 143    0   Costa Rica
## 144    0   Costa Rica
## 145    0   Costa Rica
## 146    0   Costa Rica
## 147    0   Costa Rica
## 148    0   Costa Rica
## 149    0   Costa Rica
## 150    0   Costa Rica
## 151    0   Costa Rica
## 152    0   Costa Rica
## 153    0   Costa Rica
## 154    0   Costa Rica
## 155    0   Costa Rica
## 156    0   Costa Rica
## 157    0   Costa Rica
## 158    0   Costa Rica
## 159    0   Costa Rica
## 160    0   Costa Rica
## 161    0   Costa Rica
## 162    0   Costa Rica
## 163    0   Costa Rica
## 164    0   Costa Rica
## 165    0   Costa Rica
## 166    0   Costa Rica
## 167    0   Costa Rica
## 168    0   Costa Rica
## 169    0   Costa Rica
## 170    1   Costa Rica
## 171    1   Costa Rica
## 172    1   Costa Rica
## 173    1   Costa Rica
## 174    1   Costa Rica
## 175    1   Costa Rica
## 176    1   Costa Rica
## 177    1   Costa Rica
## 178    1   Costa Rica
## 179    1   Costa Rica
## 180    1   Costa Rica
## 181    1   Costa Rica
## 182    1   Costa Rica
## 183    1   Costa Rica
## 184    1   Costa Rica
## 185    1   Costa Rica
## 186    0 N Queensland
## 187    0 N Queensland
## 188    0 N Queensland
## 189    0 N Queensland
## 190    0 N Queensland
## 191    0 N Queensland
## 192    0 N Queensland
## 193    0 N Queensland
## 194    0 N Queensland
## 195    0 N Queensland
## 196    0 N Queensland
## 197    0 N Queensland
## 198    0 N Queensland
## 199    0 N Queensland
## 200    0 N Queensland
## 201    0 N Queensland
## 202    0 N Queensland
## 203    0 N Queensland
## 204    0 N Queensland
## 205    0 N Queensland
## 206    0 N Queensland
## 207    0 N Queensland
## 208    0 N Queensland
## 209    0 N Queensland
## 210    0 N Queensland
## 211    0 N Queensland
## 212    0 N Queensland
## 213    0 N Queensland
## 214    0 N Queensland
## 215    0 N Queensland
## 216    0 N Queensland
## 217    0 N Queensland
## 218    0 N Queensland
## 219    0 N Queensland
## 220    0 N Queensland
## 221    0 N Queensland
## 222    0 N Queensland
## 223    0 N Queensland
## 224    0 N Queensland
## 225    0 N Queensland
## 226    0 N Queensland
## 227    1 N Queensland
## 228    1 N Queensland
## 229    1 N Queensland
## 230    1 N Queensland
## 231    1 N Queensland
## 232    1 N Queensland
## 233    1 N Queensland
## 234    1 N Queensland
## 235    1 N Queensland
## 236    1 N Queensland
## 237    1 N Queensland
## 238    1 N Queensland
## 239    1 N Queensland
## 240    1 N Queensland
## 241    1 N Queensland
## 242    1 N Queensland
## 243    1 N Queensland
## 244    1 N Queensland
## 245    1 N Queensland
## 246    1 N Queensland
## 247    0 S Queensland
## 248    0 S Queensland
## 249    0 S Queensland
## 250    0 S Queensland
## 251    0 S Queensland
## 252    0 S Queensland
## 253    0 S Queensland
## 254    0 S Queensland
## 255    0 S Queensland
## 256    0 S Queensland
## 257    0 S Queensland
## 258    0 S Queensland
## 259    0 S Queensland
## 260    1 S Queensland
## 261    1 S Queensland
## 262    1 S Queensland
## 263    1 S Queensland
## 264    1 S Queensland
## 265    1 S Queensland
## 266    1 S Queensland
## 267    1 S Queensland
## 268    1 S Queensland
## 269    1 S Queensland
## 270    1 S Queensland
## 271    1 S Queensland
## 272    1 S Queensland
## 273    1 S Queensland
## 274    1 S Queensland
## 275    1 S Queensland
## 276    1 S Queensland
## 277    1 S Queensland
## 278    0     Tasmania
## 279    0     Tasmania
## 280    1     Tasmania
## 281    1     Tasmania
## 282    1     Tasmania
## 283    1     Tasmania
## 284    1     Tasmania
## 285    1     Tasmania
## 286    1     Tasmania
# ggpairs() is built on ggplot2, so base-graphics par() settings have no effect here
ggpairs(leafshape[,c(1:3)], title="Standard leaf measurements", diag=list(continuous='density'), axisLabels='none')

# columns 7:5 are the log-transformed versions of columns 1:3 (loglen, logpet, logwid)
ggpairs(leafshape[,c(7:5)], title="Logarithmic leaf measurements", diag=list(continuous='density'), axisLabels='none')

(b) Coloring the cases by the variable arch, which describes the leaf architecture:

library(car)
## Loading required package: carData
## Registered S3 methods overwritten by 'car':
##   method                          from
##   influence.merMod                lme4
##   cooks.distance.influence.merMod lme4
##   dfbeta.influence.merMod         lme4
##   dfbetas.influence.merMod        lme4
## 
## Attaching package: 'car'
## The following object is masked from 'package:dplyr':
## 
##     recode
## The following object is masked from 'package:lawstat':
## 
##     levene.test
data(leafshape, package="DAAG")
par(mar = c(1.1, 1.1, 1.1, 1.1))
# car >= 3.0 argument names: smooth (was smoother), regLine (was reg.line), diagonal = list(method = ...)
spm(leafshape[c(1:3)], pch = c(16, 16), diagonal = list(method = "histogram"),
    smooth = FALSE, regLine = FALSE, groups = factor(leafshape$arch))

# same scatterplot matrix for the log-transformed measurements
spm(leafshape[c(7:5)], pch = c(16, 16), diagonal = list(method = "histogram"),
    smooth = FALSE, regLine = FALSE, groups = factor(leafshape$arch))

With this coloring we can now see two groups of points, one for each value of arch, in every scatterplot of the matrix.

Exercise 8. Olive oils from Italy

(a) Scatterplot matrix of the eight continuous variables representing fatty acids:

library(GGally)
library(ggplot2)
data(olive, package="zenplots")
#summary(olive)

ggpairs(olive[,c(3:10)], title="Olive acids", diag=list(continuous='density'), axisLabels='none')

#pairs(olive)

(b) There might be some outliers depending on different variables:

Other observations:

Exercise 9. Boston housing

(a) Splom of all continuous variables except chas:

library(GGally)
library(ggplot2)
data(Boston, package="MASS")
#print(Boston)
ggpairs(Boston[,-c(4)], title="Boston housing", diag=list(continuous='density'), axisLabels='none')

The variable most clearly positively associated with medv is rm.
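As a numerical complement to the splom (a sketch, not part of the original solution, assuming only that MASS is installed), we can sort every variable's Pearson correlation with medv. Correlation only captures linear association, so it supports rather than replaces the visual reading.

```r
# Correlation of every Boston variable with the median home value (medv).
data(Boston, package = "MASS")

medv_cor <- cor(Boston)[, "medv"]
medv_cor <- sort(medv_cor[names(medv_cor) != "medv"], decreasing = TRUE)

# Positive values indicate a positive linear association with medv.
print(round(medv_cor, 2))
```

rm should appear at the top of the list; weaker positive correlations for other variables may also show up, and these are harder to spot in the splom.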

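(b) Several scatterplots involving the variable crim have an unusual form, because crim is strongly skewed (most tracts have a crime rate close to zero) and some of the other variables take the same value for many cases. For example, ptratio takes values only in a limited range, and many tracts share exactly the same value.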
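The explanation can be checked numerically. The sketch below (base R plus the `MASS::Boston` data) summarises the skew of crim, counts the distinct values each variable takes, and shows how many tracts share the most common ptratio value.

```r
data(Boston, package = "MASS")

# crim is strongly right-skewed: most tracts have rates near zero.
print(summary(Boston$crim))

# Number of distinct values per variable; variables with few distinct values
# (or one value shared by many tracts) produce banded scatterplots.
n_distinct <- sapply(Boston, function(v) length(unique(v)))
print(sort(n_distinct))

# Most common ptratio values and how many tracts share each of them.
ptratio_counts <- sort(table(Boston$ptratio), decreasing = TRUE)
print(head(ptratio_counts, 3))
```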

(c)

3. Investigating Multivariate Continuous Data

Exercise 1. Swiss

Solution

library('GGally')

# (a)
ggparcoord(data = swiss, columns=c(1:6), scale="uniminmax", alphaLines=0.2) + 
  xlab("") + ylab("")

(b) There might be some outliers depending on different variables:

(c) The variable Catholic appears to have two modes, one at the lowest end of the rescaled axis (0 to 0.25) and one at the highest end (0.8 to 1.0). This suggests that a Swiss province (in this 1888 data) usually had either a large Catholic majority or a large non-Catholic majority.
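To back up the two-mode reading, here is a small base R sketch that rescales Catholic to [0, 1] in the same way as `scale="uniminmax"` (subtract the minimum, divide by the range) and counts the provinces in the low, middle and high bands read off the plot. The band limits are simply the ones quoted above.

```r
# Rescale Catholic to [0, 1], as the "uniminmax" scale does for each axis.
cath <- swiss$Catholic
cath01 <- (cath - min(cath)) / (max(cath) - min(cath))

# Count provinces in the low, middle and high bands of the rescaled axis.
bands <- cut(cath01, breaks = c(0, 0.25, 0.8, 1),
             labels = c("low (0-0.25)", "middle", "high (0.8-1)"),
             include.lowest = TRUE)
band_counts <- table(bands)
print(band_counts)
```

A bimodal variable shows up as a small count in the middle band compared with both extremes.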

# (d)
swiss1 <- within(swiss, 
                 catholics_level <- factor(ifelse(Catholic > 80, 'High', 'Low')))

ggparcoord(data = swiss1[order(swiss1$Catholic),], columns=c(1:6), scale="uniminmax", 
           groupColumn="catholics_level", alphaLines=0.5) + 
  xlab("") + ylab("") + 
  theme(legend.position = "none")

(d) The provinces with a high proportion of Catholics appear to have higher Fertility and lower Examination (% of draftees receiving the highest mark on the army examination) and Education. Infant.Mortality does not seem to differ much between provinces with a Catholic majority and those with a non-Catholic majority.
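The visual comparison can be summarised with group medians. This sketch reuses the same "more than 80% Catholic" grouping as in (d) and only base R's `aggregate()`; medians are chosen (our assumption) because they are less sensitive to the outlying provinces than means.

```r
# Same grouping as in (d): more than 80% Catholic is "High".
swiss1 <- within(swiss,
                 catholics_level <- factor(ifelse(Catholic > 80, 'High', 'Low')))

# Median of each variable within each group.
group_medians <- aggregate(cbind(Fertility, Examination, Education, Infant.Mortality)
                           ~ catholics_level, data = swiss1, FUN = median)
print(group_medians)
```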

Exercise 2. Pottery

Solution

# (a)
library('HSAUR2')
## Loading required package: tools
ggparcoord(data = pottery, columns=c(1:9), scale="uniminmax", alphaLines=0.2) + 
  xlab("") + ylab("")

(a) This pcp shows several details:

# (b)
# print the sorted MgO values and pick a threshold in the clear gap between 0.72 and 1.50 (here 1.0)
sort(pottery$MgO)
##  [1] 0.53 0.56 0.60 0.63 0.67 0.67 0.67 0.67 0.68 0.72 1.50 1.56 1.62 1.62 1.63
## [16] 1.65 1.67 1.81 1.82 1.83 1.83 1.86 1.92 1.94 1.94 1.99 2.00 2.05 2.06 2.06
## [31] 2.33 3.43 3.47 3.77 3.88 3.94 4.26 4.30 4.52 5.34 5.51 5.64 5.69 5.91 7.23
pottery1 <- within(pottery, 
                   mgo_level <- factor(ifelse(MgO < 1, 'Low', 'High')))  # use the 1.0 threshold here

ggparcoord(data = pottery1[order(pottery1$MgO),], columns=c(1:9), scale="uniminmax", 
           groupColumn="mgo_level", alphaLines=0.5) + 
  xlab("") + ylab("") + 
  theme(legend.position = "none")

(b) The cases with low MgO also have lower Fe2O3, CaO, Na2O, K2O and MnO than the other cases, and some of them have higher values of Al2O3 and TiO2.
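The 1.0 threshold was picked by eye from the sorted MgO values. One way to find such a break automatically (a heuristic of our own, not the original method) is to look for the largest gap between consecutive sorted values on the log scale, i.e. the largest relative jump. The values below are copied from the `sort(pottery$MgO)` output above.

```r
# Sorted MgO values copied from the output of sort(pottery$MgO) above.
mgo <- c(0.53, 0.56, 0.60, 0.63, 0.67, 0.67, 0.67, 0.67, 0.68, 0.72,
         1.50, 1.56, 1.62, 1.62, 1.63, 1.65, 1.67, 1.81, 1.82, 1.83,
         1.83, 1.86, 1.92, 1.94, 1.94, 1.99, 2.00, 2.05, 2.06, 2.06,
         2.33, 3.43, 3.47, 3.77, 3.88, 3.94, 4.26, 4.30, 4.52, 5.34,
         5.51, 5.64, 5.69, 5.91, 7.23)

# Gaps between consecutive values on the log scale (relative gaps).
log_gaps <- diff(log(mgo))
i <- which.max(log_gaps)

# Put the threshold at the geometric mean of the two values around the gap.
threshold <- sqrt(mgo[i] * mgo[i + 1])
cat("largest relative gap between", mgo[i], "and", mgo[i + 1],
    "-> threshold ~", round(threshold, 2), "\n")
```

Relative gaps are used because on the raw scale the largest jump is at the top of the range (5.91 to 7.23), which is not the break seen in the pcp.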

# (c)
ggparcoord(data = pottery, columns=c(1:9), scale="uniminmax", 
           groupColumn="kiln", alphaLines=0.5) + 
  xlab("") + ylab("") + geom_line(size=0.7)

(c) This pcp shows some differences between the kilns:

Exercise 3. Olive oils

Solution

# (a)
library('pdfCluster')
## pdfCluster 1.0-3
## 
## Attaching package: 'pdfCluster'
## The following object is masked from 'package:dplyr':
## 
##     groups
data("oliveoil")
ggparcoord(data = oliveoil, columns=c(3:10), scale="uniminmax", alphaLines=0.2) + 
  xlab("") + ylab("")

(a) Some of the observed features in this pcp:

# (b)
ggparcoord(data = oliveoil, columns=c(3:10), scale="uniminmax", 
           groupColumn="region", alphaLines=0.7) + 
  xlab("") + ylab("") 

(b) In this pcp we can find that:

(c) For the scatterplot matrix down below:

While for a pcp:

ggpairs(oliveoil[,c(3:10)], title="Olive acids", diag=list(continuous='density'), axisLabels='none')

Exercise 4. Cars

Solution

data(Cars93, package="MASS")
col_indices <- which(names(Cars93) %in% c('Price', 'MPG.city', 'MPG.highway', 'Horsepower', 'RPM', 'Length', 'Width', 'Turn.circle', 'Weight'))
ggparcoord(data = Cars93, columns=col_indices, scale="uniminmax", alphaLines=0.7) + 
  xlab("") + ylab("") 

(a) In this plot we could conclude the following:

(b) I would plot a pcp grouped by Origin (below), which shows some differences between USA and non-USA cars:

# (b)
ggparcoord(data = Cars93, columns=col_indices, scale="uniminmax", 
           groupColumn="Origin", alphaLines=0.7) + 
  xlab("") + ylab("")

(c) Yes, a pcp with uniminmax scaling is informative, since it allowed us to extract the insights in (a) and (b). Below is the same pcp with the default standard scaling applied (scale="std": for each axis, subtract the mean and divide by the standard deviation), with the observations colored by the number of Cylinders. Here we can also see that:

# (c)
ggparcoord(data = Cars93, columns=col_indices,
           groupColumn="Cylinders", alphaLines=0.7) + 
  xlab("") + ylab("") + geom_line(size=0.75)
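The two scalings compared in (c) can be written out by hand. The functions below are our own illustration of the definitions that GGally documents for `scale="uniminmax"` and `scale="std"`, not GGally's internal code; they are applied here to Cars93 Price.

```r
data(Cars93, package = "MASS")

# "uniminmax": rescale each variable to [0, 1] using its own min and max.
uniminmax <- function(x) (x - min(x)) / (max(x) - min(x))

# "std": subtract the mean and divide by the standard deviation.
std_scale <- function(x) (x - mean(x)) / sd(x)

price_u <- uniminmax(Cars93$Price)
price_s <- std_scale(Cars93$Price)

print(range(price_u))                             # always 0 and 1
print(round(c(mean(price_s), sd(price_s)), 10))   # mean 0, sd 1
```

With uniminmax every axis spans the full height, so extreme cars sit exactly at 0 or 1. With std the axes have different extents, which keeps information about how far outliers lie from the bulk of the data.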

Exercise 5. Bodyfat

Solution

data(bodyfat, package="MMST")
ggparcoord(data = bodyfat, columns=1:15, scale="uniminmax", alphaLines=0.7) + 
  xlab("") + ylab("") 

(a) There is clearly one outlier that takes the maximum value (1.0 on the uniminmax scale) on many variables: bodyfat, weight, neck, chest, abdomen, hip, thigh, knee, biceps and wrist. Since he is not the tallest man in the sample (judging by his height), I would say this outlier is probably not an athlete but rather a person with severe obesity.

In addition, there are two more outliers on the ankle axis, which may also be outliers on the hip, abdomen and chest measurements.
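A case that sits at 1.0 on many uniminmax axes is the case that attains the column maximum on many variables. The hypothetical helper `count_max_axes()` below (our own name, not from any package) counts this per row; it is shown on a small made-up data frame because the MMST package may not be available.

```r
# Hypothetical helper: for each row, count the variables on which that row
# attains the column maximum (i.e. sits at 1.0 on a uniminmax axis).
count_max_axes <- function(df) {
  at_max <- sapply(df, function(col) col == max(col, na.rm = TRUE))
  rowSums(at_max, na.rm = TRUE)
}

# Small made-up example: row 3 is the largest on three of the four variables.
toy <- data.frame(weight = c(150, 170, 360, 180),
                  chest  = c(95, 100, 136, 102),
                  hip    = c(98, 101, 148, 100),
                  height = c(70, 74, 72, 68))
print(count_max_axes(toy))
```

On the real data, `which.max(count_max_axes(bodyfat[, 1:15]))` could then be used to locate the row of this outlier.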

(b) The height variable has many points of concentration, as if it were a categorical variable. A likely reason is that the heights were recorded with low precision. We can quickly check this:

# the first ten observations suggest that heights were rounded to the nearest quarter of an inch (.00/.25/.50/.75)
bodyfat$height[1:10]
##  [1] 67.75 72.25 66.25 72.25 71.25 74.75 69.75 72.50 74.00 73.50

So this “categorical behaviour” is indeed caused by the low precision of the recorded heights (rounded to the nearest quarter of an inch).
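The quarter-rounding can be tested directly rather than by eye: a value rounded to the nearest quarter has no remainder after multiplying by 4. The sketch uses the ten heights printed above; the same check could be run on the full `bodyfat$height` column.

```r
# First ten heights, copied from the output of bodyfat$height[1:10] above.
h <- c(67.75, 72.25, 66.25, 72.25, 71.25, 74.75, 69.75, 72.50, 74.00, 73.50)

# A value rounded to the nearest quarter has no remainder after multiplying by 4.
is_quarter <- (h * 4) %% 1 == 0
print(all(is_quarter))

# Fractional parts that occur: only .00, .25, .50 and .75 are expected.
print(sort(unique(h %% 1)))
```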

(c) The pcp shows that as density increases, bodyfat decreases, and vice versa, so there is clearly a strong negative correlation between these two variables. This is also suggested by the fact that all the profiles between the two axes cross at (what appears to be) a single point.
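Why a strong negative relationship shows up as profiles crossing at one point can be derived in a few lines. This assumes the two axes are adjacent and that the relationship is exactly linear after uniminmax scaling; bodyfat and density are only approximately linear, which is why the crossing only appears to be a single point.

```latex
Let $u_i \in [0,1]$ be the rescaled density of case $i$ and $v_i$ its rescaled bodyfat.
If the relation is exactly linear and decreasing, then $v_i = 1 - u_i$.
With the two axes at horizontal positions $0$ and $1$, the profile segment of case $i$ is
\[
  y_i(t) = (1 - t)\,u_i + t\,v_i = (1 - t)\,u_i + t\,(1 - u_i), \qquad t \in [0,1].
\]
At the midpoint $t = \tfrac{1}{2}$,
\[
  y_i\!\left(\tfrac{1}{2}\right) = \tfrac{1}{2}\,u_i + \tfrac{1}{2}\,(1 - u_i) = \tfrac{1}{2},
\]
independently of $u_i$, so every segment passes through $\left(\tfrac{1}{2}, \tfrac{1}{2}\right)$.
A positive linear relation ($v_i = u_i$) instead gives horizontal, parallel segments $y_i(t) = u_i$.
```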

(d) Yes, the ordering of the variables can affect the pcp display. For this dataset we can try ordering the variables by their medians, so that the measurements are sorted by size and give a better overview of the data.

medians = apply(bodyfat[, 1:15], 2, median, na.rm=TRUE)
ordered_medians_indexes = order(medians)
ggparcoord(data = bodyfat, alphaLines=0.3,
           scale="globalminmax", order=ordered_medians_indexes) + coord_flip()

Here we can see, for example, that the wrist has the smallest values among the body circumference measurements and that the chest has the largest.
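The median-ordering idea does not depend on GGally. As a sketch, the same reordering is applied below to base R's swiss data (chosen only because MMST may not be installed) and drawn with `MASS::parcoord()`. Note that `parcoord()` rescales each axis to [0, 1] like "uniminmax", so it shows the new axis order but not the global scale used above.

```r
library(MASS)  # for parcoord()

# Order the columns by their medians (smallest first), as with
# ordered_medians_indexes above.
med <- apply(swiss, 2, median, na.rm = TRUE)
swiss_by_median <- swiss[, order(med)]
print(names(swiss_by_median))

# Base-graphics parallel coordinate plot with the reordered axes.
parcoord(swiss_by_median, col = rgb(0, 0, 0, 0.3))
```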

Exercise 6. Exam marks

Solution

(a) I think a good pcp to present to others would be the following, with globalminmax scale and y limits 0-100, where we can deduce that:

library('SMPracticals')
## Loading required package: ellipse
## 
## Attaching package: 'ellipse'
## The following object is masked from 'package:car':
## 
##     ellipse
## The following object is masked from 'package:graphics':
## 
##     pairs
## 
## Attaching package: 'SMPracticals'
## The following object is masked from 'package:HSAUR2':
## 
##     smoking
## The following object is masked from 'package:GGally':
## 
##     pigs
## The following objects are masked from 'package:MASS':
## 
##     cement, forbes, leuk, shuttle
ggparcoord(data = mathmarks, columns=1:5, scale="globalminmax", alphaLines=0.7) + 
  xlab("") + ylab("") + coord_cartesian(ylim=c(0,100))

We could also plot a pcp that highlights the students who scored more than 60 on the Vectors exam. In this pcp we can observe that:

mathmarks1 <- within(mathmarks, 
                     vectors_perf <- factor(ifelse(vectors > 60, 'Good', 'Bad')))
ggparcoord(data = mathmarks1, columns=1:5, alphaLines=0.7, scale="globalminmax", 
           groupColumn='vectors_perf') + 
  xlab("") + ylab("") + theme(legend.position = "none") + coord_cartesian(ylim=c(0,100))

# (b)
ggparcoord(data = mathmarks, columns=1:5, scale="globalminmax", alphaLines=0.1, boxplot=TRUE) + 
  xlab("") + ylab("") + coord_cartesian(ylim=c(0,100))

(b) The boxplots make it easier to see the maximum, minimum and median marks and the outliers for each exam. From this plot alone, we cannot tell for certain that Mechanics and Vectors were the closed-book exams. We can say that the Statistics exam (an open-book exam) has the lowest median mark of all the subjects, which somewhat contradicts the expectation that an open-book exam would produce higher marks.

However, the minimum mark in Mechanics (~3 points) is the lowest among all subjects, which is consistent with it having been a closed-book exam.

Regarding the polygonal lines, I do not think it is necessary to draw them in a boxplot pcp in this specific scenario, even with the alphaLines option, since they just fill the space without adding any useful information, as can be seen above.

Exercise 7. Wine

Solution

(a) Using the following pcps, which separate the profiles by wine class (red: Barbera, green: Barolo, blue: Grignolino), we can easily see which variables help to distinguish the different wines:

library('gridExtra')
## 
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
## 
##     combine
data(wine, package="MMST")
a = ggparcoord(wine, columns=1:13, groupColumn="class",scale="uniminmax") + xlab("") + ylab("") +
  theme(legend.position = "none") +
  scale_colour_manual(values = c("red","grey", "grey")) + coord_flip()
b = ggparcoord(wine, columns=1:13, groupColumn="class",scale="uniminmax") + xlab("") + ylab("") +
  theme(legend.position = "none") +
  scale_colour_manual(values = c("grey","green", "grey")) + coord_flip()
c = ggparcoord(wine, columns=1:13, groupColumn="class",scale="uniminmax") + xlab("") + ylab("") +
  theme(legend.position = "none") +
  scale_colour_manual(values = c("grey","grey", "blue")) + coord_flip()
grid.arrange(a, b, c, nrow=1)

(b) There are certainly several outliers in the data, as shown for example by the extreme high and low values in the blue pcp on the MalicAcid, Ash, Flav and Hue axes.

(c) Yes, there might be some evidence that these classes have subgroups of wines inside them. Some examples would be:

Exercise 8. Boston housing

Solution

data(Boston, package="MASS")

hcluster = hclust(dist(Boston), method='ward.D2')
clu4 = cutree(hcluster, k=4)
clus = factor(clu4)
boston1 = cbind(Boston, clus)

# (a)
ggparcoord(boston1, columns=1:14, groupColumn="clus", scale="uniminmax") + 
  xlab("") + ylab("")

# (b)
a = ggparcoord(boston1[which(boston1$clus == 1),], columns=1:14, scale="uniminmax",
               mapping=aes(color='#f5aca7')) + 
  xlab("") + ylab("") +
  theme(legend.position = "none") +
  scale_colour_manual(values = c("#f5aca7"))
b = ggparcoord(boston1[which(boston1$clus == 2),], columns=1:14, scale="uniminmax",
               mapping=aes(color='#a8c75a')) + 
  xlab("") + ylab("") +
  theme(legend.position = "none") +
  scale_colour_manual(values = c("#a8c75a"))
c = ggparcoord(boston1[which(boston1$clus == 3),], columns=1:14, scale="uniminmax",
               mapping=aes(color='#1ec5c9')) + 
  xlab("") + ylab("") +
  theme(legend.position = "none") +
  scale_colour_manual(values = c("#1ec5c9"))
d = ggparcoord(boston1[which(boston1$clus == 4),], columns=1:14, scale="uniminmax", 
               mapping=aes(color='#cc8afd')) + 
  xlab("") + ylab("") +
  theme(legend.position = "none") +
  scale_colour_manual(values = c("#cc8afd"))
grid.arrange(a, b, c, d)

# (c)
a = ggparcoord(rbind(boston1[which(boston1$clus != 1),], boston1[which(boston1$clus == 1),]),
               columns=1:14, groupColumn="clus", scale="uniminmax") +
  xlab("") + ylab("") +
  theme(legend.position = "none") +
  scale_colour_manual(values = c("#f5aca7","grey", "grey", "grey"))
b = ggparcoord(rbind(boston1[which(boston1$clus != 2),], boston1[which(boston1$clus == 2),]),
               columns=1:14, groupColumn="clus", scale="uniminmax") +
  xlab("") + ylab("") +
  theme(legend.position = "none") +
  scale_colour_manual(values = c("grey","#a8c75a", "grey", "grey"))
c = ggparcoord(rbind(boston1[which(boston1$clus != 3),], boston1[which(boston1$clus == 3),]),
               columns=1:14, groupColumn="clus", scale="uniminmax") +
  xlab("") + ylab("") +
  theme(legend.position = "none") +
  scale_colour_manual(values = c("grey","grey", "#1ec5c9", "grey"))
d = ggparcoord(rbind(boston1[which(boston1$clus != 4),], boston1[which(boston1$clus == 4),]),
               columns=1:14, groupColumn="clus", scale="uniminmax") +
  xlab("") + ylab("") +
  theme(legend.position = "none") +
  scale_colour_manual(values = c("grey","grey", "grey", "#cc8afd"))
grid.arrange(a, b, c, d)

The first plot could be useful when space is limited. However, with 4 clusters it quickly becomes hard to distinguish the categories, and the plot turns into a messy mix of colors.

The second approach, where each cluster is plotted on its own, looks much cleaner than the others. However, the profile shapes differ from those in the other two plots. This is because plotting each cluster separately makes the uniminmax scale use that cluster's own minimum and maximum on every axis. With this approach, uniminmax is therefore not very helpful for comparing clusters, and a globalminmax scale may be more appropriate for that task.
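The effect on the axis limits can be checked numerically. This base R sketch reuses the Ward clustering from above; medv is just an example variable. It rescales the houses of cluster 1 once with the cluster's own limits (what per-cluster uniminmax does) and once with the full-data limits of the same variable, which is one alternative (our assumption) for keeping panels comparable.

```r
data(Boston, package = "MASS")

# Same clustering as above: Ward linkage on Euclidean distances, 4 clusters.
clus <- factor(cutree(hclust(dist(Boston), method = "ward.D2"), k = 4))

# Full-data range of medv versus its range inside each cluster.
overall <- range(Boston$medv)
print(overall)
print(tapply(Boston$medv, clus, range))

# Rescale to [0, 1]; lo/hi default to the vector's own min and max.
rescale01 <- function(x, lo = min(x), hi = max(x)) (x - lo) / (hi - lo)

medv1 <- Boston$medv[clus == "1"]
local_pos  <- rescale01(medv1)                                    # cluster-only axis
global_pos <- rescale01(medv1, lo = overall[1], hi = overall[2])  # full-data axis

# The same houses sit at different heights depending on which limits are used.
print(summary(local_pos - global_pos))
```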

The third option, which plots each cluster highlighted with the other clusters in the background, is a clean way of visualizing the groups while also comparing their differences. It preserves the axis limits and makes outliers much easier to spot. If I had to choose, this is how I would display the clustering results.
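The four near-identical blocks in (c) differ only in which cluster is drawn last and which color it gets. A hypothetical helper, `highlight_cluster()` (our own name, not a GGally function), can build both pieces. The commented lines sketch how its result might be passed to `ggparcoord()`, assuming GGally and ggplot2 are loaded as above.

```r
# Hypothetical helper (not a GGally function): reorder rows so that group k is
# drawn last (on top) and build a matching palette that greys out other groups.
highlight_cluster <- function(df, group_col, k, highlight_col, other_col = "grey") {
  g <- factor(df[[group_col]])
  sel <- as.character(g) == as.character(k)
  ordered <- rbind(df[!sel, , drop = FALSE], df[sel, , drop = FALSE])
  palette <- ifelse(levels(g) == as.character(k), highlight_col, other_col)
  list(data = ordered, palette = palette)
}

data(Boston, package = "MASS")
clus <- factor(cutree(hclust(dist(Boston), method = "ward.D2"), k = 4))
boston1 <- cbind(Boston, clus)

h2 <- highlight_cluster(boston1, "clus", 2, "#a8c75a")
print(h2$palette)

# With GGally loaded, the second panel of (c) could then be drawn as:
# ggparcoord(h2$data, columns = 1:14, groupColumn = "clus", scale = "uniminmax") +
#   xlab("") + ylab("") + theme(legend.position = "none") +
#   scale_colour_manual(values = h2$palette)
```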